Abstract



The purpose of this paper is twofold. First, we investigate the
impact of concurrency control on transaction execution cost and system
throughput in centralized and distributed database systems (DBS) based
on slow and fast (local) networks. Second, we show that in terms of
transaction execution cost and DBS throughput there are some
applications for which any distributed DBS can be more effective than
any centralized DBS and vice versa. We also argue that for other
applications the decision in favour of distributed or centralized DBS
should be based on the comparison of specific DBS systems.



1. Introduction. 



Distributed database systems (D-DBS) are alleged to provide numerous
advantages over centralized database systems (C-DBS). The usual
arguments in favour of the D-DBS are:

a. improved user attitude - the distribution of data and processing
gives users greater control and autonomy over the data processing

b. improved reliability and availability - because of the
partitioning of data and replication of processors and data

c. improved extensibility and modularity

d. decreasing cost of hardware should make D-DBS cost effective

e. D-DBS could provide better performance and perhaps lower
transaction execution cost because they have inherent concurrent
execution capabilities not available in C-DBS

We expect that in many applications the last consideration, i.e.
transaction execution cost and performance, will be the principal factor
when deciding between C-DBS and D-DBS. Therefore, in this paper we
analyze and compare the transaction execution cost and the performance
of C-DBS and D-DBS. We pursue two goals. First, we want to investigate
the impact of concurrency control on transaction execution cost and
system throughput in C-DBS and D-DBS based on slow and fast (sometimes
called local) networks. Second, we are interested in a simple and
robust analysis which explains certain intuitive notions about the
suitability of D-DBS or C-DBS for some applications. Our analysis is
general, i.e. it is not meant to represent any particular concurrency
control mechanism, C-DBS or D-DBS. The analysis can, however, be made
specific by substituting proper values, representing specific
concurrency control mechanisms or DBS systems, into the derived
formulas.



2. Previous work. 



It appears that there are no published papers dealing with compara- 
tive analysis of distributed and centralized DBS systems. However, 
recently some work has been done on the performance analysis of distri- 
buted DBS (BAD 80, MOL 79, RIE 79) and there are two publications (BUC 
79, STE 79) somewhat related to this paper. The paper by Bucci and 
Streeter (BUC 79) deals with the cost and performance analysis of a sin- 
gle processor with multiple remote terminals. The paper also contains a 
short sketch of a very simple comparative cost analysis of distributed 
and centralized DBS. The paper by Stewart (STE 79) describes a discrete 
simulation modelling tool developed for the performance analysis of IBM 
SNA system configurations. It is obvious from the paper that this per- 
formance analysis tool can be used for already implemented systems as 
the simulator input requires detailed system implementation information 
e.g., number of instructions per program, number of I/O, a priority 
interrupt mechanism, a priority dispatcher, CPU definition, etc. How- 
ever, it is indicated in the paper that the simulator could be used for 
the performance analysis of distributed DBS - presumably once it has 
been implemented. 



3. D-DBS vs. C-DBS: Transaction Execution Cost Analysis.



In our analysis, we consider C-DBS with remote users and a D-DBS 
with similar capabilities in which data and processing power are distri- 
buted to the remote users. Figure 1 shows both DBS configurations we 
analyze in this paper. (RU is remote user and LU is local user). 

In this paper the cost is defined in terms of the number of executed
instructions, the amount of generated I/O and the number of messages,
or any subset of these. We model the average cost of processing one
transaction in the C-DBS as consisting of three parts as follows:

$$C_c = C_{csys} + C_{com} + C_{syn} \tag{1}$$

where

$C_{csys}$ is the average cost of executing one transaction in the
C-DBS without a concurrency control (CC)

$C_{com}$ is the average (communication) cost of submitting the
transaction from the remote user to the C-DBS

$C_{syn}$ is the average CC cost, i.e. the average cost due to
synchronization of the transaction

Since any transaction in any DBS is either conflicting (i.e., trying
to acquire resources already acquired by some other transaction) or
nonconflicting, $C_{syn}$ consists of two types of costs - the cost
($C_{cc}$) associated with nonconflicting transactions and the cost
($C_{confl}$) incurred when the transactions interfere, i.e., conflict.
We can express $C_{syn}$ as follows:

$$C_{syn} = \mathit{DI} \cdot C_{confl} + C_{cc} \tag{2}$$

where



$\mathit{DI}$, the degree of interference, is the ratio of the number
of conflicting transactions to the number of all transactions

$C_{confl}$ is the average CC conflict resolution cost per conflicting
transaction

$\mathit{DI} \cdot C_{confl}$ is the average conflict resolution cost
per transaction

$C_{cc}$ is the average CC no-conflict cost per transaction

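
The decomposition in (1) and (2) can be sketched as a small Python
helper. This is only an illustration: the function names and the
numeric values are assumptions chosen for the example, not measurements
of any system.

```python
def sync_cost(di, c_confl, c_cc):
    """Equation (2): C_syn = DI * C_confl + C_cc."""
    if not 0.0 <= di <= 1.0:
        raise ValueError("DI is a ratio and must lie in [0, 1]")
    return di * c_confl + c_cc


def cdbs_cost(c_csys, c_com, di, c_confl, c_cc):
    """Equation (1): C_c = C_csys + C_com + C_syn."""
    return c_csys + c_com + sync_cost(di, c_confl, c_cc)


# Hypothetical costs in arbitrary units (e.g. executed instructions).
c_c = cdbs_cost(c_csys=100.0, c_com=20.0, di=0.25, c_confl=40.0, c_cc=10.0)
print(c_c)  # 140.0
```

With a quarter of the transactions conflicting, the conflict term
contributes 10 units and the no-conflict CC overhead another 10.
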

Transactions in D-DBS can be divided not only into conflicting and
nonconflicting (as in C-DBS) but also into local and nonlocal (or
global) transactions, depending on whether they execute at one site
only (local transactions) or at more than one site (global
transactions). Thus the cost of transaction processing in the
distributed DBS (D-DBS) can be modeled as the cost of local and global
transaction executions weighted by the terms reflecting the number of
local and global transactions in the system. Then the cost of the
transaction processing in D-DBS is:



$$C_d = X \cdot (C_{lsys} + C_{lsyn}) + (1 - X) \cdot (C_{gsyn} + C_{gsys}) \tag{3}$$

where

$C_{lsys}$ is the average cost of local transaction execution without
considering concurrency control

$C_{lsyn}$ is the average CC cost per local transaction, i.e., one
which needs to access data only at one site

$C_{gsys}$ is the average cost of global transaction execution without
considering concurrency control

$C_{gsyn}$ is the average CC cost per global transaction, i.e. one
which needs to access data at more than one site of the D-DBS

$X$ is the ratio of the number of local transactions to all
transactions

$1 - X$ is the ratio of the number of global transactions to all
transactions

Let

$$C_{gsys} = C_{lsys} + C_{data} \tag{3a}$$

where

$C_{data}$ is the average (communication) cost of data transfers
during the global transaction execution

$C_{lsyn}$ can be further decomposed in a manner similar to the case
of C-DBS:

$$C_{lsyn} = \mathit{DI}_l \cdot C_{lconfl} + C_{lcc} \tag{4}$$



where

$\mathit{DI}_l$ is the degree of interference of local transactions at
each site of the D-DBS, i.e. the ratio of the number of conflicting
local transactions (i.e., local transactions conflicting with local
transactions but not with global transactions) to the total number of
local transactions

$C_{lcc}$ is the average no-conflict cost per local transaction

$C_{lconfl}$ is the average CC conflict resolution cost per local
conflicting transaction.

$C_{gsyn}$ can also be decomposed in a similar manner as follows:

$$C_{gsyn} = C_{gcc} + \mathit{DI}_g \cdot C_{gconfl} \tag{5}$$

where

$C_{gcc}$ is the average CC no-conflict cost per global transaction

$C_{gconfl}$ is the average CC conflict resolution cost per global
transaction

$\mathit{DI}_g$ is the ratio of the number of conflicting transactions
(i.e., global transactions conflicting with global and local
transactions) to the number of global transactions

Substituting into (1) and (3) from (2), (3a), (4), and (5) we obtain

$$C_c = C_{csys} + C_{com} + \mathit{DI} \cdot C_{confl} + C_{cc} \tag{6}$$

$$C_d = C_{lsys} + X \cdot (\mathit{DI}_l \cdot C_{lconfl} + C_{lcc})
      + (1 - X) \cdot (\mathit{DI}_g \cdot C_{gconfl} + C_{gcc} + C_{data}) \tag{7}$$
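
As a quick sanity check on the substitution, the following Python
sketch evaluates the D-DBS cost once from (3) with (3a), (4) and (5)
plugged in, and once from the expanded form (7). The function names and
all parameter values are hypothetical.

```python
import math


def ddbs_cost_eq3(x, c_lsys, c_data, di_l, c_lconfl, c_lcc,
                  di_g, c_gconfl, c_gcc):
    """Equation (3), built from (3a), (4) and (5)."""
    c_gsys = c_lsys + c_data              # (3a)
    c_lsyn = di_l * c_lconfl + c_lcc      # (4)
    c_gsyn = c_gcc + di_g * c_gconfl      # (5)
    return x * (c_lsys + c_lsyn) + (1.0 - x) * (c_gsyn + c_gsys)


def ddbs_cost_eq7(x, c_lsys, c_data, di_l, c_lconfl, c_lcc,
                  di_g, c_gconfl, c_gcc):
    """Equation (7), the expanded form of (3)."""
    return (c_lsys
            + x * (di_l * c_lconfl + c_lcc)
            + (1.0 - x) * (di_g * c_gconfl + c_gcc + c_data))


# Hypothetical parameter values, arbitrary cost units.
args = dict(x=0.8, c_lsys=100.0, c_data=30.0, di_l=0.1, c_lconfl=40.0,
            c_lcc=10.0, di_g=0.2, c_gconfl=60.0, c_gcc=25.0)
same = math.isclose(ddbs_cost_eq3(**args), ddbs_cost_eq7(**args))
```

The two forms agree because the local execution cost appears in both
the local and the global branch of (3), so the weights X and 1 - X sum
to one for that term.
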

In order to simplify (6) and (7) we assume that the cost of
synchronizing local transactions in D-DBS is proportional to the cost
of synchronizing the transactions in C-DBS, i.e., we assume that:

$$C_{lconfl} = K_1 \cdot C_{confl}, \qquad C_{lcc} = K_1 \cdot C_{cc} \tag{8}$$

where $K_1 \neq 1$ if the following applies:

a) processors in D-DBS cannot support the same CC mechanism at
the same cost as C-DBS, e.g., number of I/O is different, etc.

b) D-DBS uses a CC mechanism different from the one used in C-DBS

c) when both of the above apply

We can also assume that the cost of local transaction execution
without synchronization in D-DBS is proportional to the cost of
transaction execution without synchronization in C-DBS, i.e., we
assume:

$$C_{lsys} = K_2 \cdot C_{csys} \tag{9}$$

where $K_2 \neq 1$ if processors in D-DBS execute local transactions
(without CC) at a different cost compared to C-DBS.



We also assume, somewhat arbitrarily, that the degree of global and
local transaction interference in D-DBS is proportional to the degree of
transaction interference in C-DBS as follows:

$$\mathit{DI}_g = (1 - K_3) \cdot \mathit{DI}, \qquad
  \mathit{DI}_l = K_3 \cdot \mathit{DI} \tag{10}$$

where large $K_3$ reflects good D-DBS design and/or applications
with strong locality. We feel that the above assumption, whether right
or wrong, becomes irrelevant for applications featuring very low degrees
of transaction interference. We note here that most present
applications seem to be in that class.

Finally we assume that the cost of synchronizing global transactions in
D-DBS is proportional to the cost of synchronizing the transactions in
C-DBS, i.e., we assume that:

$$C_{gcc} = K_0 \cdot C_{cc}, \qquad C_{gconfl} = K_0 \cdot C_{confl} \tag{11}$$

We will refer to a D-DBS which uses a slow network (slow compared to
secondary memory channels) as a slow D-DBS, for which $K_0 \gg 1$. We
will refer to a D-DBS which uses a fast network (comparable to secondary
memory channels) as a fast (local) D-DBS, for which $K_0 \approx 1$.
Substituting (8), (9), (10) and (11) into (6) and (7) we get

$$C_c = C_{csys} + C_{com} + \mathit{DI} \cdot C_{confl} + C_{cc} \tag{12}$$

$$C_d = K_2 \cdot C_{csys}
      + X \cdot (K_1 \cdot K_3 \cdot \mathit{DI} \cdot C_{confl} + K_1 \cdot C_{cc})
      + (1 - X) \cdot ((1 - K_3) \cdot \mathit{DI} \cdot K_0 \cdot C_{confl}
      + K_0 \cdot C_{cc} + C_{data}) \tag{13}$$

We would like to know when the cost of transaction execution is
larger in C-DBS compared to the cost of transaction execution in D-DBS.
Let

$$C_c > C_d \tag{14}$$

Substituting from (12) and (13) into (14) we obtain

$$(1 - K_2) \cdot C_{csys} + C_{com}
  + (1 - K_1 \cdot K_3 \cdot X) \cdot \mathit{DI} \cdot C_{confl}
  + (1 - K_1 \cdot X) \cdot C_{cc}
  > (1 - X) \cdot (K_0 \cdot (1 - K_3) \cdot \mathit{DI} \cdot C_{confl}
  + K_0 \cdot C_{cc} + C_{data}) \tag{15}$$
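
The parameterized comparison can be explored numerically. The sketch
below encodes the C-DBS cost (12) and the D-DBS cost (13) and tests
condition (14) directly; the class name, field names and the sample
values are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class CostParams:
    c_csys: float        # C-DBS execution cost without CC
    c_com: float         # cost of submitting a transaction to the C-DBS
    c_confl: float       # conflict resolution cost per conflicting transaction
    c_cc: float          # no-conflict CC cost per transaction
    c_data: float        # data transfer cost of a global transaction
    di: float            # degree of interference, in [0, 1]
    x: float             # share of local transactions, in [0, 1]
    k0: float = 1.0      # global synchronization factor, eq. (11)
    k1: float = 1.0      # local synchronization factor, eq. (8)
    k2: float = 1.0      # local execution factor, eq. (9)
    k3: float = 1.0      # locality of interference, eq. (10)


def cost_cdbs(p):
    """Equation (12)."""
    return p.c_csys + p.c_com + p.di * p.c_confl + p.c_cc


def cost_ddbs(p):
    """Equation (13)."""
    local = p.k1 * p.k3 * p.di * p.c_confl + p.k1 * p.c_cc
    glob = (1.0 - p.k3) * p.di * p.k0 * p.c_confl + p.k0 * p.c_cc + p.c_data
    return p.k2 * p.c_csys + p.x * local + (1.0 - p.x) * glob


def ddbs_is_cheaper(p):
    """Condition (14): C_c > C_d."""
    return cost_cdbs(p) > cost_ddbs(p)


# Hypothetical workload: half of the transactions are global, no conflicts.
base = dict(c_csys=100.0, c_com=20.0, c_confl=40.0, c_cc=10.0,
            c_data=30.0, di=0.0, x=0.5)
fast = CostParams(**base, k0=1.0)
slow = CostParams(**base, k0=10.0)
print(ddbs_is_cheaper(fast), ddbs_is_cheaper(slow))  # True False
```

In this made-up workload the D-DBS is cheaper on the fast network (125
versus 130 units) but not on the slow one (170 versus 130 units).
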

If $X \approx 1$, i.e. when almost all transactions in D-DBS are local,
then (15) reduces to

$$(1 - K_2) \cdot C_{csys} + C_{com}
  + (1 - K_1 \cdot K_3) \cdot \mathit{DI} \cdot C_{confl}
  + (1 - K_1) \cdot C_{cc} > 0 \tag{16}$$

When almost all transactions are local (or equivalently when D-DBS is
well designed) we introduce only negligible error by assuming that
$K_3 = 1$, i.e., assuming that the degree of interference in C-DBS is
the same as in D-DBS. Then (16) can be rewritten as

$$C_{com} > (K_2 - 1) \cdot C_{csys} + (K_1 - 1) \cdot C_{syn} \tag{17}$$

Let's assume that $K_1 = 1$, i.e. C-DBS and D-DBS execute under the
same CC mechanism and the CC cost is the same in both. Then (17)
reduces to

$$C_{com} > (K_2 - 1) \cdot C_{csys} \tag{18}$$

If we also assume that $K_2 = 1$, i.e. the cost of transaction
execution without CC in D-DBS and C-DBS is the same, then

$$C_{com} > 0 \tag{19}$$

which is always true.

The above result says that in applications where (a) almost all
transactions are local, (b) the same CC mechanism is used in C-DBS and
D-DBS, and (c) local processors in D-DBS do not impose any processing
cost penalty compared to the C-DBS processor, the D-DBS will result in
lower transaction execution cost regardless of the speed of its network
and regardless of the degree of interference. This result, derived from
the analysis of our model, is in complete accord with our intuition, as
is to be expected if the model is realistic.
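
The all-local case can be checked numerically. The sketch below
computes the difference between the C-DBS and D-DBS costs from the
parameterized forms (12) and (13), fixes X = 1 and K_1 = K_2 = K_3 = 1,
and varies the degree of interference and the network factor K_0. The
function name and the sample values are assumptions.

```python
def cost_gap(c_csys, c_com, c_confl, c_cc, c_data, di, x, k0, k1, k2, k3):
    """Return C_c - C_d using the parameterized forms (12) and (13)."""
    c_c = c_csys + c_com + di * c_confl + c_cc
    c_d = (k2 * c_csys
           + x * (k1 * k3 * di * c_confl + k1 * c_cc)
           + (1.0 - x) * ((1.0 - k3) * di * k0 * c_confl
                          + k0 * c_cc + c_data))
    return c_c - c_d


C_COM = 20.0  # hypothetical communication cost, arbitrary units
gaps = [
    cost_gap(c_csys=100.0, c_com=C_COM, c_confl=40.0, c_cc=10.0,
             c_data=30.0, di=di, x=1.0, k0=k0, k1=1.0, k2=1.0, k3=1.0)
    for di in (0.0, 0.1, 0.5, 1.0)
    for k0 in (1.0, 10.0, 100.0)
]
# Every gap equals C_com (up to rounding), whatever DI and K_0 are.
all_equal_c_com = all(abs(g - C_COM) < 1e-9 for g in gaps)
```

The gap never depends on DI or K_0 here: with no global transactions
the network factor drops out, and identical CC costs cancel, leaving
only the cost of shipping the transaction to the central site.
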

We come to the same conclusion when $K_2 = 1$ and $K_1 < 1$ (i.e. local
processors in D-DBS do not impose any transaction processing cost
penalty compared to the C-DBS processor and D-DBS executes under a
different CC mechanism which has lower overhead cost).

Let's assume that $K_1 = K_2$, i.e. the local processors in D-DBS
impose the same cost penalty on transaction processing and on CC
program processing. Then from (17) we get

$$C_{com} > (K_1 - 1) \cdot (C_{syn} + C_{csys}) \tag{20}$$

From (20) the only scenario on which we can make a general observation
is when $K_1 \gg 1$, i.e., when local processors in D-DBS impose a
heavy processing cost penalty compared to the C-DBS processor. Then
from (20) we obtain

$$C_{com} > K_1 \cdot (C_{syn} + C_{csys}) \tag{21}$$

The conclusion from (21) is that even if $K_1 \gg 1$, D-DBS can still
result in lower transaction processing cost in applications where users
ship large amounts of data per transaction between remote terminals and
C-DBS processors, terminal communication lines are slow and costly,
and transactions are not computationally intensive.

We note here that our analysis could be made specific by substituting
either measured or assumed values for the parameters in our formulas.
We avoid doing so as no commercial D-DBS is operational today.
Instead, we use a few reasonable assumptions so that we can simplify
our formulas and then make general observations.



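Let's assume that $\mathit{DI} \approx 0$, i.e. very few transactions
interfere (that seems to be a valid assumption for most applications).
Also assume that $K_1 = K_2 = 1$, i.e. local processors in D-DBS do not
impose any cost penalty on transactions and CC programs processing
compared to the C-DBS processor. Then from (15) we obtain

$$C_{com} > (1 - X) \cdot ((K_0 - 1) \cdot C_{cc} + C_{data}) \tag{22}$$

For fast D-DBS $K_0 \approx 1$ and (22) reduces to

$$C_{com} > (1 - X) \cdot C_{data} \tag{23}$$

For slow D-DBS $K_0 \gg 1$ and (22) reduces to

$$C_{com} > (1 - X) \cdot (K_0 \cdot C_{cc} + C_{data}) \tag{24}$$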

The observation we offer on (23) and (24) is that when comparing
transaction execution cost in C-DBS and fast D-DBS, the CC mechanism
is not very important. However, it becomes very important when
comparing the cost of transaction execution in C-DBS and slow D-DBS.
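
To make the fast versus slow contrast concrete, the following sketch
computes the break-even communication cost implied by (15) when DI = 0
and K_1 = K_2 = 1, namely (1 - X) * ((K_0 - 1) * C_cc + C_data). The
D-DBS is cheaper whenever C_com exceeds this value. The function name
and the numbers are hypothetical, and K_0 = 1 is taken here to stand
for a fast network.

```python
def break_even_c_com(x, k0, c_cc, c_data):
    """Value C_com must exceed for C_c > C_d when DI = 0 and K1 = K2 = 1.

    Obtained by setting DI = 0 and K1 = K2 = 1 in inequality (15).
    """
    return (1.0 - x) * ((k0 - 1.0) * c_cc + c_data)


# Hypothetical values: 20% global transactions, arbitrary cost units.
fast = break_even_c_com(x=0.8, k0=1.0, c_cc=10.0, c_data=30.0)
slow = break_even_c_com(x=0.8, k0=50.0, c_cc=10.0, c_data=30.0)

# On the fast network the CC overhead has no influence on the threshold.
fast_costly_cc = break_even_c_com(x=0.8, k0=1.0, c_cc=1000.0, c_data=30.0)
```

With these numbers the threshold is about 6 units on the fast network
and about 104 units on the slow one, and raising the CC overhead a
hundredfold leaves the fast-network threshold unchanged.
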



4. C-DBS vs D-DBS: Performance Analysis. 



The second issue we want to investigate in this paper is the impact
of concurrency control on throughput of C-DBS and D-DBS based on slow
and fast networks. We are also interested in identifying applications
for which we can say that any D-DBS will likely outperform any C-DBS and
vice versa.

The C-DBS throughput can be derived by considering the fact that the
system transaction processing rate is decreased by synchronization of
transactions. Thus we can express C-DBS throughput $Q_c$ as follows:

$$Q_c = [S_C - (S_{Ccc} + \mathit{DI} \cdot S_{Cconfl})] \tag{25}$$

where

$S_C$ is the basic transaction processing rate of C-DBS which does
not have any concurrency control

$S_{Ccc}$ is the fraction of basic transaction processing rate $S_C$
used for synchronization of transactions

$S_{Cconfl}$ is the fraction of $S_C$ used for resolution of conflicts.


We model D-DBS as a set of n processors whose physical dependency
due to an underlying network and whose logical and functional dependency
due to global transactions are reflected only as a decrease of the
throughput of every site in D-DBS. Thus, the throughput of D-DBS can be
derived in terms of each node's processing rate and its decrease due to
the synchronization of local and global transactions as follows:

$$Q_d = n \cdot \{ Y \cdot [S_L - (S_{Lcc} + \mathit{DI}_L \cdot S_{Lconfl})]
      + (1 - Y) \cdot [C_0 \cdot S_L - (S_{Gcc} + \mathit{DI}_G \cdot S_{Gconfl})] \} \tag{26}$$

where

$Y$ is the ratio of local transactions to all transactions

$n$ is the number of sites or nodes in the D-DBS

$S_L$ is the transaction processing rate of each of n nodes without
concurrency control

$S_{Lcc}$ is the fraction of $S_L$ used for the synchronization of
transactions local to one site

$\mathit{DI}_L$ is the degree of local transaction interference, i.e.,
the ratio of local interfering transactions to all local transactions

$S_{Lconfl}$ is the fraction of $S_L$ used for the resolution of local
transaction conflicts

$S_{Gcc}$ is the fraction of $S_L$ used for the processing of
synchronization messages of global transactions

$S_{Gconfl}$ is the fraction of $S_L$ used for the resolution of global
transaction conflicts

$\mathit{DI}_G$ is the degree of global transaction interference, i.e.,
the ratio of interfering global transactions to all global transactions

$C_0$ is the ratio of D-DBS average network delay to D-DBS local
processor I/O time
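
The two throughput expressions can be sketched in Python as follows.
The sketch assumes that the conflict term in (25) is weighted by the
degree of interference, in the same way as the conflict terms of (26).
Function names and sample rates are hypothetical.

```python
def throughput_cdbs(s_c, s_ccc, di, s_cconfl):
    """Equation (25): Q_c = S_C - (S_Ccc + DI * S_Cconfl)."""
    return s_c - (s_ccc + di * s_cconfl)


def throughput_ddbs(n, y, s_l, s_lcc, di_l, s_lconfl,
                    c0, s_gcc, di_g, s_gconfl):
    """Equation (26): n nodes, local share Y and global share 1 - Y."""
    local = s_l - (s_lcc + di_l * s_lconfl)
    glob = c0 * s_l - (s_gcc + di_g * s_gconfl)
    return n * (y * local + (1.0 - y) * glob)


# Hypothetical rates in transactions per second.
q_c = throughput_cdbs(s_c=100.0, s_ccc=5.0, di=0.0, s_cconfl=10.0)
q_d = throughput_ddbs(n=4, y=0.5, s_l=20.0, s_lcc=1.0, di_l=0.0,
                      s_lconfl=2.0, c0=1.0, s_gcc=3.0, di_g=0.0,
                      s_gconfl=6.0)
print(q_c, q_d)  # 95.0 72.0
```
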



We further assume that

$$S_{Gcc} = C_{01} \cdot S_{Lcc}, \qquad S_{Gconfl} = C_{02} \cdot S_{Lconfl}$$

where

$$C_{01} = C_{00} \cdot M_{cc}, \qquad C_{02} = C_{00} \cdot M_{confl}$$

where

$C_{00}$ is the ratio of average network delay to CC message set-up
time

$M_{cc}$ is the CC no-conflict overhead, i.e., the average number of
messages a given CC mechanism requires for synchronization of
nonconflicting transactions

$M_{confl}$ is the CC conflict overhead, i.e., the average number of
messages a given CC mechanism requires for the resolution of transaction
conflicts. (An example of CC conflict and no-conflict overhead analysis
of several CC mechanisms can be found in (BAD 81)).



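We also assume that

$$S_L = C_1 \cdot S_C, \qquad S_{Lcc} = C_2 \cdot S_{Ccc}, \qquad
  S_{Lconfl} = C_2 \cdot S_{Cconfl}$$

$$\mathit{DI}_L = C_3 \cdot \mathit{DI}, \qquad
  \mathit{DI}_G = (1 - C_3) \cdot \mathit{DI}$$

We are interested in when

$$Q_c > Q_d \tag{27}$$

Substituting into (27) from (25) and (26) and using the above
assumptions we obtain

$$S_C - (S_{Ccc} + \mathit{DI} \cdot S_{Cconfl})
  > n \cdot \{ Y \cdot [C_1 \cdot S_C - (C_2 \cdot S_{Ccc}
  + C_3 \cdot C_2 \cdot \mathit{DI} \cdot S_{Cconfl})]
  + (1 - Y) \cdot [C_0 \cdot C_1 \cdot S_C - (C_{01} \cdot C_2 \cdot S_{Ccc}
  + (1 - C_3) \cdot C_{02} \cdot C_2 \cdot \mathit{DI} \cdot S_{Cconfl})] \} \tag{28}$$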

In order to simplify (28) we will assume that $\mathit{DI} \approx 0$,
$C_1 = C_2$ and $C_0 = 1$. Thus we consider applications which have
very few conflicting transactions. We also assume that C-DBS and D-DBS
either use the same CC mechanisms or, if they are different, the
decrease of local processor throughput is the same as if they both used
the same CC mechanisms ($C_1 = C_2$). Finally we assume fast D-DBS
where D-DBS local processor I/O speed is the same as D-DBS network
speed ($C_0 = 1$). Substituting these assumptions into (28) leads to

$$\frac{S_C}{S_{Ccc}} \cdot (1 - n \cdot C_1)
  > 1 - n \cdot C_1 \cdot (C_{01} + Y \cdot (1 - C_{01})) \tag{29}$$

We consider three cases: $1 > n \cdot C_1$, $1 = n \cdot C_1$ and
$1 < n \cdot C_1$. When $1 > n \cdot C_1$, i.e., $S_C > n \cdot S_L$,
then (29) can be rewritten as

$$\frac{S_C}{S_{Ccc}}
  > \frac{1 - n \cdot C_1 \cdot (Y + C_{01} \cdot (1 - Y))}{1 - n \cdot C_1} \tag{30}$$

However, for any D-DBS

$$\frac{S_C}{S_{Ccc}} \gg 1 \tag{31}$$

and therefore

$$\frac{1 - n \cdot C_1 \cdot (Y + C_{01} \cdot (1 - Y))}{1 - n \cdot C_1} \geq 1 \tag{32}$$

As we assumed $1 - n \cdot C_1 > 0$, (32) reduces to

$$C_{01} \leq 1 \tag{33}$$

Thus if $S_C > n \cdot S_L$, then $Q_c > Q_d$ only if $C_{01} \leq 1$.

When $n \cdot C_1 = 1$ then (29), considering (31), reduces to

$$C_{01} > 1 \tag{34}$$

Thus if $S_C = n \cdot S_L$ then $Q_c > Q_d$ only if $C_{01} > 1$, else
$Q_c < Q_d$.

When $1 < n \cdot C_1$, i.e., when $S_C < n \cdot S_L$, then (29),
considering (31), reduces to

$$C_{01} \gg 1 \tag{35}$$

Thus if $S_C < n \cdot S_L$, then $Q_c > Q_d$ only if $C_{01} \gg 1$,
else $Q_d > Q_c$.

From (33), (34) and (35) we can conclude that when comparing
throughput of C-DBS and D-DBS based on fast network, the CC mechanism
(i.e. its CC overhead) and its efficient implementation (i.e. efficient
message processing) are quite important. This observation is not
entirely intuitive in the light of our assumption of very few
conflicting transactions, i.e., $\mathit{DI} \approx 0$. Of course, if
there are many conflicting transactions one would expect the CC
mechanism to be important for D-DBS throughput.

An interesting observation can be made on (35). Even if the D-DBS
transaction processing rate without CC mechanism is larger than the
transaction processing rate without CC in C-DBS, a D-DBS with CC
mechanism can perform worse than C-DBS with CC mechanism if
$C_{01} \gg 1$, i.e., if either the CC mechanism has high no-conflict
overhead or it has slow CC message processing.
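
This effect can be illustrated with made-up numbers. The sketch below
evaluates both throughputs under the simplifying assumptions used for
(29), i.e. no conflicts, equal scaling of processing rate and CC
overhead on the local nodes, and a network as fast as local I/O. The
function name and all values are assumptions, not data from any system.

```python
def throughputs(s_c, s_ccc, n, c1, y, c01):
    """Q_c and Q_d from (25) and (26) with DI = 0, C_1 = C_2 and C_0 = 1."""
    q_c = s_c - s_ccc
    s_l = c1 * s_c            # per-node rate, S_L = C_1 * S_C
    s_lcc = c1 * s_ccc        # local CC share, S_Lcc = C_2 * S_Ccc, C_2 = C_1
    s_gcc = c01 * s_lcc       # global CC share, S_Gcc = C_01 * S_Lcc
    q_d = n * (y * (s_l - s_lcc) + (1.0 - y) * (s_l - s_gcc))
    return q_c, q_d


# Hypothetical system: 4 nodes at 30% of the central rate each, so the
# raw D-DBS rate (n * S_L = 120) exceeds the raw C-DBS rate (S_C = 100).
cheap_cc = throughputs(s_c=100.0, s_ccc=5.0, n=4, c1=0.3, y=0.5, c01=1.0)
costly_cc = throughputs(s_c=100.0, s_ccc=5.0, n=4, c1=0.3, y=0.5, c01=12.0)
```

With a global CC factor of 1 the D-DBS delivers about 114 against 95
for the C-DBS; with a factor of 12 it drops to about 81 and falls
behind, although its raw processing rate is unchanged.
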

Let's consider applications where $Y \approx 1$, i.e., there are very
few global transactions or equivalently almost all transactions are
local. In such case we can also assume that $C_3 = 1$. Substituting
these assumptions into (27) we obtain

$$S_C \cdot (1 - n \cdot C_1)
  > (S_{Ccc} + \mathit{DI} \cdot S_{Cconfl}) \cdot (1 - n \cdot C_2) \tag{36}$$
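
A short numerical check of this reduction: the sketch below evaluates
the general inequality (28) with Y = 1 and C_3 = 1 and compares it with
the reduced form S_C*(1 - n*C_1) > (S_Ccc + DI*S_Cconfl)*(1 - n*C_2).
Both are written as left side minus right side. The function names and
the values are hypothetical.

```python
def gap_eq28(s_c, s_ccc, s_cconfl, di, n, y, c0, c1, c2, c3, c01, c02):
    """Left side minus right side of inequality (28)."""
    lhs = s_c - (s_ccc + di * s_cconfl)
    local = c1 * s_c - (c2 * s_ccc + c3 * c2 * di * s_cconfl)
    glob = c0 * c1 * s_c - (c01 * c2 * s_ccc
                            + (1.0 - c3) * c02 * c2 * di * s_cconfl)
    return lhs - n * (y * local + (1.0 - y) * glob)


def gap_eq36(s_c, s_ccc, s_cconfl, di, n, c1, c2):
    """Left side minus right side of inequality (36)."""
    return s_c * (1.0 - n * c1) - (s_ccc + di * s_cconfl) * (1.0 - n * c2)


# Hypothetical values; c0, c01 and c02 drop out because Y = 1 removes
# the global term entirely.
common = dict(s_c=100.0, s_ccc=5.0, s_cconfl=10.0, di=0.2, n=4,
              c1=0.2, c2=0.3)
full = gap_eq28(**common, y=1.0, c0=1.0, c3=1.0, c01=7.0, c02=9.0)
reduced = gap_eq36(**common)
```

Both expressions give the same gap (about 21.4 here), so the sign of
the reduced form decides the comparison exactly as the full one does.
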

We consider (36) when $1 > n \cdot C_1$, $1 = n \cdot C_1$ and
$1 < n \cdot C_1$.

When $1 > n \cdot C_1$ then from (36) and (31) we obtain:

$$C_2 < C_1 \tag{37}$$

Thus when $S_C > n \cdot S_L$ then $Q_c > Q_d$ only when $C_2 < C_1$,
i.e., when D-DBS and C-DBS use different CC mechanisms or they use the
same one but the D-DBS local processor throughput decrease due to
synchronization of nonconflicting transactions is smaller than the
decrease in local processor transaction processing rate compared to
C-DBS transaction processing rate.

When $1 = n \cdot C_1$, i.e., when $S_C = n \cdot S_L$, then (36)
always holds and thus $Q_c > Q_d$.

When $1 < n \cdot C_1$ then from (36) and (31) we get

$$C_2 > C_1 \tag{38}$$

Thus when $S_C < n \cdot S_L$ then $Q_c > Q_d$ only if $C_2 > C_1$. If
$C_2 < C_1$ then $Q_d > Q_c$. The implication here is that $Q_d > Q_c$
if either D-DBS uses a more efficient CC mechanism than C-DBS or D-DBS
uses the same CC mechanism but its implementation is more efficient
compared to D-DBS transaction processing, i.e., if $C_2 < C_1$.




As can be seen from (36), (37) and (38), the CC mechanism is a
significant issue when comparing performance of C-DBS and D-DBS
systems. Moreover, it is important even for the performance of D-DBS
based on fast network and for applications where either there are few
interfering transactions or where there are few global transactions.

5. Conclusions. 



In this paper we have investigated transaction execution cost and
system throughput in C-DBS compared to D-DBS systems based on slow and
fast networks. The conclusions reached in this paper indicate that the
efficiency of the CC mechanism is of great importance when comparing
C-DBS and fast or slow D-DBS throughput. The same observation applies
when comparing transaction execution cost in C-DBS and slow D-DBS. It
seems that the CC mechanism is not important when comparing the cost of
transaction execution in C-DBS and fast D-DBS.

In this paper we have also indicated for which applications any D-DBS
is likely to be a better solution than any C-DBS and vice versa.